(54) A method and system for suggesting related documents



(57) The document reading system passively analyzes a document to
generate margin or end notes of references to other documents that
relate to annotated passages in the document or to the entire document.
The invention is responsive to the annotation of a document to passively
generate a query that retrieves documents that have similar content to
the annotated passage. The retrieved documents are available to the
reader through selectable links placed in the margin near the
annotation. Additionally, the invention provides end notes with links to
documents that are similar in content to the overall content of the
annotated document. The invention assists the reader by passively
generating selectable links to related documents to assist the user in
relating the new document to previously read material.
Description

[0001] This invention relates generally to electronic document reading
systems. In particular, this invention is directed to an electronic
document reading system that suggests other related documents when
displaying a first document.

[0002] Retrieving documents similar to a document identified by the user
as being related is known as relevance feedback. Relevance feedback is
described in "Introduction to Modern Information Retrieval", G. Salton
et al., McGraw Hill, (1983), incorporated herein by reference in its
entirety. Interfaces that support relevance feedback conventionally
require explicit action on the part of the reader and do not
spontaneously offer suggestions of relevant documents. Information
exploration interfaces designed for window-based computing environments
typically present search results for other relevant documents via lists
in a separate window or by replacing the visible document with the
search results. These systems are very intrusive and interrupt the
reading process.

[0003] Hypertext interfaces display links to documents relevant to a
source document either by providing a margin that contains the links or
by embedding the links in the text of the source document in the manner
pioneered by "Hyperties." This system is described in "User Interface
Design for the Hyperties Electronic Encyclopedia", by Shneiderman,
Proceedings of Hypertext '87, November 1987, Chapel Hill, NC,
incorporated herein by reference in its entirety. However, these links
are static and are created along with the source document by the
hypertext author. Some systems, such as Trellis, display links
dynamically, but only from a fixed set of previously-defined links.
Trellis is described in "Programmable Browsing Semantics and Trellis",
by R. Furuta et al., Proceedings of Hypertext '89, November 1989,
Pittsburgh, PA, ACM Press, incorporated herein by reference in its
entirety.

[0004] The HieNet System uses inter-node similarity measures to create
hypertext links based on links previously created by the hypertext
author. This system is described in "HieNet: A User-Centered Approach
for Automatic Link Generation", D.T. Chang, Proceedings of Hypertext
'93, November 1993, Seattle, WA, ACM Press, incorporated herein by
reference in its entirety. When the author creates a link from a
document A to a document B, the system automatically adds links from all
documents similar to document A to all documents similar to document B.
Anchors for these automatically-generated links are represented by icons
in the margin of the various documents. Clicking on an icon displays a
pop-up menu that contains a list of possible destination documents that
are ranked by relevance to the query. Again, this system relies on links
previously created by the author.

[0005] Other conventional systems relate to hypertext-like ways of
displaying search results. HieNet displays automatic links in the
margin, but anchors in the margin are not relevant to the content of the
passage adjacent to the anchor. HieNet does not distinguish between
document-document and passage-document links. Furthermore, HieNet does
not indicate the number and nature of the documents reachable through
the margin links.

[0006] Visualization of Information Retrieval System (hereinafter VOIR)
is described in "Queries? Links? Is There a Difference?", Proceedings of
CHI '97, G. Golovchinsky, March 1997, Atlanta, GA, ACM Press and in
"What the Query Told the Link: The Integration of Hypertext and
Information Retrieval", Proceedings of Hypertext '97, G. Golovchinsky,
April 1997, Southampton, UK, ACM Press, each incorporated herein by
reference in its entirety. VOIR is a mechanism that dynamically creates
and resolves hypertext links with queries that are computed from the
text surrounding a selected anchor. VOIR uses queries to retrieve sets
of documents that are related to the passage containing the selected
anchor. VOIR does not show the user links that have pre-established
relationships. Rather, to submit a query and to establish a
relationship, the user has to pause and select an anchor. VOIR was
designed specifically to support interactive information exploration,
rather than to facilitate the reading process. Thus, VOIR's focus is
supporting navigation between documents. The user is thus expected to
devote much cognitive effort to browsing. Furthermore, VOIR does not
permit the user to annotate or tag documents. VOIR also does not
indicate which link was selected to generate a particular display.

[0007] A background information retrieval process called the Remembrance
Agent (hereinafter RA) is described in "A Continuously Running Automated
Information Retrieval System", B.J. Rhodes et al., Proceedings of The
First International Conference on the Practical Application of
Intelligent Agents and Multi-Agent Technology, PAAM '96, April, 1997,
London, UK, incorporated herein by reference in its entirety. RA
operates in an EMACS text window and suggests documents related to the
last few lines of text typed by the user. RA is designed to search
through a user's private data to suggest documents related to the text
being typed. However, these suggestions are ephemeral and relate only to
text that is currently being written. RA does not support reading tasks
because it continuously replaces suggestions as the user edits the
document.

[0008] QRL is a query-based information exploration interface that uses
ink-like marks on text to specify boolean queries. This system is
described in "Queries-R-Links: Graphical Markup for Text Navigation", by
G. Golovchinsky et al., Proceedings of INTERCHI '93, April 1993,
Amsterdam, The Netherlands, ACM Press, incorporated herein by reference
in its entirety. Query terms are selected with rectangles. Lines connect
the rectangles to represent boolean AND operators.

[0009] All of these systems require extensive user interaction to
generate links to related documents or only support writing. An
electronic document reading system is needed that passively and
unobtrusively generates links to related documents to support reading.

[0010] This invention provides a method and a system for passively
showing the reader related documents without interfering with the
reading process.

[0011] The invention further provides intuitive support for reading by
automatically detecting documents potentially of interest to the reader
based on the reader's interaction with the source document being read.
When people read text, they often make annotations to highlight
interesting or controversial passages and terms. The presence or
relative density of such marks and scribbles may be used as an indicator
of the relative interest that the reader has in a particular passage.
When a large body of documents related to the document being read is
available, the reader may be interested in finding related documents as
part of the reading process.

[0012] References to documents related to specific passages of interest
to the user are placed in the source document's margins and references
to documents similar overall to the source document are inserted as end
notes. The system and method of this invention maintain the links once
they have been identified to facilitate non-linear reading and skimming.

[0013] A user's interests are inferred from annotations made while
reading the source document. Therefore, the system and method of this
invention minimize cognitive overhead in two ways: 1) no expressive
query is required to identify documents related to the source document;
and 2) selectable links to the related documents are provided
unobtrusively in the margins and at the end of the document, as shown in
Figs. 2 and 3, respectively.

[0014] The system also introduces suggestions to the reader in a manner
compatible with other interactions, rather than burdening the user with
modal dialogues. Suggested documents are accessible by following the
selectable links. However, the user does not have to act on a suggestion
when it is made. Rather, the user can act on the suggestion when (or if)
it makes sense to do so. The system and method of this invention
represent the type of the referenced document with an icon and provide a
textual label to the icon to give users a better understanding of the
target of the link.

[0015] These and other features and advantages of this invention are
described in or apparent from the following detailed description of the
preferred embodiments.

[0016] The preferred embodiments of this invention will be described in
detail, with reference to the following figures, wherein:

Fig. 1 is a block diagram of one embodiment of the electronic document
reading system of this invention;

Fig. 2 shows a source document having an icon in the margin adjacent to
an annotated passage;

Fig. 3 shows another source document having an endnote; and

Fig. 4 is a flowchart outlining a control routine for one embodiment of
this invention.

[0017] Fig. 1 shows a block diagram of one embodiment of a document
reading system 10 according to this invention. The document reading
system 10 includes a processor 12 communicating with a first memory 14
that stores a source document 16 that is currently being read by a user
on a display 18. The processor 12 also communicates with a second memory
20 that stores potentially related target documents 22. A user interacts
with and controls the document reading system 10 through any number of
conventional input/output devices 24, such as a mouse 26, a keyboard 28,
or a pen-based interface 30. The input/output devices 24 communicate
with an input/output interface 31 that, in turn, communicates with the
processor 12.

[0018] As shown in Fig. 1, the system 10 is preferably implemented on a
programmed general purpose computer. However, the system 10 can also be
implemented using a special purpose computer, a programmed
microprocessor or microcontroller and any necessary peripheral
integrated circuit elements, an ASIC or other integrated circuit, a
hardwired electronic or logic circuit such as a discrete element
circuit, a programmable logic device such as a PLD, PLA, FPGA or PAL, or
the like. In general, any device that supports a finite state machine
capable of implementing the flowchart shown in Fig. 4 can be used to
implement the system 10.

[0019] Additionally, as shown in Fig. 1, the storage devices or memories
14 and 20 are preferably implemented using static or dynamic RAM.
However, the devices 14 and 20 can also be implemented using a floppy
disk and disk drive, a writable optical disk and disk drive, a hard
drive, flash memory or the like. Also, it should be appreciated that the
devices 14 and 20 can be either distinct portions of a single memory or
physically distinct memories.

[0020] Further, it should be appreciated that the links 15 and 17
connecting the devices 14 and 20 and the processor 12 can be a wired or
wireless link to a network (not shown). The network can be a local area
network, a wide area network, an intranet, the Internet or any other
distributed processing and storage network. In this case, the electronic
document 16 is pulled from a physically remote memory device 14 through
link 15 for processing in the processor 12 according to the method
outlined below. In this case, the electronic document 16 can be stored
locally in a portion of some other memory device of the system 10 (not
shown).

[0021] The method of this invention identifies two kinds of target
documents 22 for each source document 16. The two types of target
documents are: 1) target documents that are specifically related to
annotated passages; and 2) documents that are generally related to the
overall source document. Once a relationship is established between the
source document and the target documents 22, the target documents may be
displayed by clicking on selectable links in the displayed document 16.

[0022] References to the two types of target documents 22 are shown in
Fig. 2. A target document 22 related to the specific passage 32 in the
source document 16 is identified by a margin representation 34 placed in
the margin of the source document 16 near the related passage 32. As
shown in Fig. 3, a target document 22 that is related to the source
document 16 as a whole is annotated and shown as an endnote 36 to the
source document. The end note 36 includes the type, the title and
summary information.

[0023] Fig. 4 is a flowchart outlining a control routine for one
embodiment of the method of this invention. Beginning in step S100, the
control routine continues to step S105. In step S105, the control
routine determines if the user has made any annotations. If not, control
loops back to step S105. If so, control continues to step S110. In step
S110, the control routine determines the annotation of the source
document made by the user. Next, in step S120, the control routine
analyzes the text of the source document and the annotation to determine
the passage being annotated. A passage may include a paragraph marked
with a margin bar, an underlined sentence or phrase, or the context of
one or more circled terms. Then, in step S130, the control routine
generates a query from the passage. The query includes content-bearing
terms from the identified passage that are weighted to give importance
to any circled words. Next, in step S140, the control routine searches
the target documents using the query to identify documents that are
related to the passage. Then, at step S150, the search results are
clustered. Clustering is preferably performed in a manner similar to
that described in "Reexamining the Cluster Hypothesis: Scatter/Gather on
Retrieval Results", M.A. Hearst et al., Proceedings of ACM SIGIR '96,
August 1996, Zurich, Switzerland, incorporated herein by reference in
its entirety.
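
As a rough illustration of steps S130 and S140, the following Python
sketch builds a term-weighted query from an annotated passage and ranks
candidate target documents by cosine similarity. The stop-word list, the
boost of 3.0 for circled words, the use of cosine similarity, and the
names build_query and rank_documents are assumptions made for this
example only; the description does not specify any of them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the description only says "content-bearing terms".
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep only content-bearing (non-stop) words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_query(passage, circled_words=(), circled_weight=3.0):
    """Step S130 (sketch): term-frequency query with circled words boosted."""
    query = {term: float(count) for term, count in Counter(tokenize(passage)).items()}
    for word in circled_words:
        term = word.lower()
        # Assumed boost; the source only says circled words are given importance.
        query[term] = query.get(term, 0.0) + circled_weight
    return query


def cosine(query, doc_terms):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * doc_terms.get(t, 0.0) for t, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_terms.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank_documents(query, documents):
    """Step S140 (sketch): score every target document against the query.

    `documents` maps a title to its text; the result is a list of
    (score, title) pairs, best match first.
    """
    scored = []
    for title, text in documents.items():
        doc_terms = {t: float(c) for t, c in Counter(tokenize(text)).items()}
        scored.append((cosine(query, doc_terms), title))
    return sorted(scored, reverse=True)
```

A caller would pass the passage found in step S120 and any circled words
to build_query, then hand the result to rank_documents together with the
collection of target documents 22.
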

[0024] Next, in step S160, the control routine selects a typical
document from each cluster. These documents are further filtered by a
user-specified similarity threshold in step S170. Then, in step S180,
the remaining documents are identified by displaying links to those
documents in the margin of the source document adjacent to the passage
from which the query was generated. Each selectable link may be an icon
representing a type of the selected and filtered target document and a
short title.
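
Steps S150 through S180 can be sketched in Python as follows. The
description prefers Scatter/Gather-style clustering; this example swaps
in a much simpler greedy single-pass clustering purely for illustration.
The tuple layout of a search result, the join threshold of 0.5, the rule
that the "typical" document is the one scoring highest against the
query, and the names cluster_results and select_links are all
assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_results(results, join_threshold=0.5):
    """Step S150 (sketch): greedy single-pass clustering, NOT Scatter/Gather.

    `results` is a list of (title, term_vector, query_score) tuples. Each
    result joins the first cluster whose first member it resembles,
    otherwise it starts a new cluster.
    """
    clusters = []
    for result in results:
        for cluster in clusters:
            if cosine(result[1], cluster[0][1]) >= join_threshold:
                cluster.append(result)
                break
        else:
            clusters.append([result])
    return clusters


def select_links(results, similarity_threshold, join_threshold=0.5):
    """Steps S160 to S180 (sketch): one document per cluster, then filter."""
    links = []
    for cluster in cluster_results(results, join_threshold):
        # S160, assumed rule: the typical document is the best query match.
        title, _, score = max(cluster, key=lambda r: r[2])
        # S170: drop documents below the user-specified similarity threshold.
        if score >= similarity_threshold:
            # S180: the surviving document becomes a margin link.
            links.append({"title": title, "score": score, "placement": "margin"})
    return links
```

Picking one document per cluster keeps near-duplicate results from
crowding the margin, which is one plausible reading of why the routine
clusters before it filters.
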

[0025] Next, in step S190, the control routine determines if a user has
selected a selectable link in the current source document. If, in step
S190, a user has selected a selectable link, the control routine
proceeds to step S200. In step S200, the target document is displayed as
the new current source document. Control then continues back to step
S105, where it waits for another annotation to be made. Alternatively,
if in step S190 no selectable link is selected, then control jumps
directly back to step S105. The control routine continues until the user
has closed all open source documents 16 displayed on the display 18.
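
The loop of Fig. 4 can be pictured as a small event-driven routine. The
Python sketch below replays a list of user events instead of waiting on
real input; the event tuples, the make_links callback standing in for
steps S110 through S180, and the choice to clear the margin links when a
new source document is displayed are assumptions made for illustration.

```python
def run_control_routine(events, make_links):
    """Replay user events through the loop of Fig. 4 (sketch).

    events: list of tuples ("annotate", passage), ("select", title) or ("close",)
    make_links: callable(passage) -> list of link titles (stands in for S110-S180)
    Returns the document displayed at the end and a log of what happened.
    """
    current_document = "source"
    margin_links = []
    log = []
    for event in events:
        kind = event[0]
        if kind == "annotate":
            # S105: an annotation was detected; S110-S180 produce the links.
            margin_links = make_links(event[1])
            log.append(("links", list(margin_links)))
        elif kind == "select" and event[1] in margin_links:
            # S190/S200: the selected target becomes the new source document.
            current_document = event[1]
            margin_links = []  # assumed: a fresh document starts with no links
            log.append(("display", current_document))
        elif kind == "close":
            # The routine ends once the user closes the document.
            break
    return current_document, log
```

Selections that do not match a displayed link are ignored, mirroring the
branch in which control returns to step S105 without changing the
display.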

[0026] To compute end notes, the flowchart of Fig. 4 can be used with
slight modifications. The control routine proceeds identically as
directed for the creation of margin notes from step S100 through step
S120. However, at step S130 a weighted sum query is generated. Terms
that are explicitly identified by the reader and terms identified by
standard relevance feedback techniques are used to construct the
weighted-sum query. The identified terms are assigned weights based upon
the annotations made to the document. For instance, words that have been
expressly selected by the user are weighted the highest and words that
occur in selected paragraphs are weighted higher than the remaining
terms of the source document.
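
The three-tier weighting described for the end-note query might look
like the Python sketch below. The description fixes only the ordering of
the weights, so the values 3.0, 2.0 and 1.0 are assumptions, as is the
name weighted_sum_query. Term expansion through relevance feedback is
left out of this sketch.

```python
import re
from collections import Counter

# Assumed values; the description fixes only their relative order.
WEIGHT_SELECTED = 3.0   # words expressly selected by the user
WEIGHT_ANNOTATED = 2.0  # other words in annotated (selected) paragraphs
WEIGHT_OTHER = 1.0      # remaining terms of the source document


def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_sum_query(paragraphs, annotated_indexes, selected_words):
    """Build {term: weight} over the whole source document (end-note S130).

    paragraphs: list of paragraph strings making up the source document
    annotated_indexes: set of indexes of paragraphs the reader marked
    selected_words: words the reader expressly selected (e.g. circled)
    """
    selected = {w.lower() for w in selected_words}
    query = Counter()
    for index, paragraph in enumerate(paragraphs):
        for word in words(paragraph):
            if word in selected:
                query[word] += WEIGHT_SELECTED
            elif index in annotated_indexes:
                query[word] += WEIGHT_ANNOTATED
            else:
                query[word] += WEIGHT_OTHER
    return dict(query)
```

Each occurrence of a term contributes its tier weight, so a term that
appears both inside and outside annotated paragraphs accumulates the sum
of both contributions.
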

[0027] Documents that have been identified as related to the document
using the weighted sum query generated in step S130 are processed in a
manner similar to the remaining steps S140 through S200, with the
exception that the link is displayed as an end note in step S180 rather
than as a margin note.

[0028] It should be understood that either or both of these control
routines may be running in the background of a document reading system
of the invention.

[0029] Optionally, the system and method of this invention may derive
summaries from documents through an automatic text summarization process
in a manner similar to that described in "A Trainable Document
Summarizer", J. Kupiec et al., Proceedings of SIGIR '95, July 1995,
Pittsburgh, PA, ACM Press, incorporated herein by reference in its
entirety. The summaries are then displayed as end notes.

[0030] It is to be understood that the term annotation as used herein is
intended to include text, digital ink, audio, video or any other input
associated with a document. It is also to be understood that the term
document is intended to include text, video, audio and any other media
and any combination of media. Further, it is to be understood that the
term text is intended to include text, digital ink, audio, video or any
other content of a document, including the document's structure.



Claims 

1. A method for displaying in a display of a first document, at least
one link to another document, each other document being related to the
first document, the method comprising:

identifying at least one user annotated segment of the first document;

identifying at least one second document that is related to the at least
one annotated segment of the first document; and

displaying in the first document a selectable link for each second
document.

2. The method of claim 1, wherein the selectable link is displayed as an
end note to the first document.

3. The method of claim 1 or claim 2, the step of identifying the at
least one second document comprising identifying at least one portion of
the at least one second document as related to the at least one
annotated segment, the selectable link referencing the identified at
least one portion, the selectable link being displayed in the margin
adjacent to the at least one annotated segment and the step of
identifying being in response to the annotation of the at least one
segment of the first document.

4. The method of any one of claims 1 to 3, wherein the step of
identifying the at least one second document comprises determining the
relatedness based upon user identified terms and terms identified using
relevance feedback techniques.

5. The method of any one of claims 1 to 4, further comprising the steps
of:

determining if the selectable link has been selected; and

displaying the identified at least one second document in response to
the selection of the selectable link.

6. An electronic document system for suggesting in a display of a first
document at least one second document that is related to the first
document, the system comprising:

a processor that identifies at least one user annotated segment of the
first document and that identifies at least one second document as
related to the annotated segment of the first document; and

a display that displays a selectable link that references the identified
at least one second document in a display of the first document.

7. The system of claim 6, wherein the selectable link is displayed as an
end note to the first document.

8. The system of any one of claims 6 and 7, wherein the processor
identifies the at least one second document based upon user identified
terms and terms identified based upon relevance feedback techniques.

9. The system of any one of claims 6 to 8, further comprising a user
input interface, wherein the processor is responsive to the annotation
of a segment of the first document by the user to identify the at least
one second document.

10. The system of any one of claims 6 to 9, further comprising a user
interface, wherein the display is responsive to the selection of the
selectable link by the user to display the identified at least one
second document.